Methods and systems for malware analysis

ABSTRACT

Methods, systems, and media for analyzing a potential malware sample are disclosed. A sample for malware analysis may be received. The sample may be received through a web interface. The sample may be analyzed using a plurality of analyzers implemented on one or more computing devices. The analyzers may perform a sequence of configurable analytic steps to extract information about the sample. The extracted information may be displayed to a user through the web interface.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 14/068,605, filed Oct. 31, 2013, the contents of which are incorporated by reference.

FIELD

This disclosure relates generally to malware analysis, and more particularly to methods, systems, and media for malware analysis.

BACKGROUND

Existing malware analysis services suffer from several deficiencies. First, some of these services, although competent against some malware threats, are not sufficient on their own to combat a malware infection; one cannot rely on a sandbox alone to determine what a piece of malware has done. Second, several previous attempts were built to target only a single type of malware or a single platform, e.g., Microsoft® Windows®. Yet malware is often platform agnostic and can target multiple platforms. Third, some of these services do not produce output understandable to anyone beyond those with specialized training, e.g., a degree in computer science. This limits the usefulness of these services for users who lack such training.

What is needed is a design in which, as malware threats change and evolve, the analysis conducted by the various processing elements can change and evolve as well.

SUMMARY

Various embodiments are generally directed to malware analysis to overcome the aforementioned problems.

One or more embodiments may include a method for analyzing a potential malware sample, the method comprising: receiving a sample for malware analysis through a web interface; analyzing the sample using a plurality of analyzers implemented on one or more computing devices, wherein the analyzers perform a sequence of configurable analytic steps to extract information about the sample; and displaying the extracted information to a user through the web interface.

One or more embodiments may include a system comprising: a memory; and a processor coupled to the memory, the processor being configured to: receive a sample for malware analysis through a web interface; analyze the sample using a plurality of analyzers implemented on one or more computing devices, wherein the analyzers perform a sequence of configurable analytic steps to extract information about the sample; and display the extracted information to a user through the web interface.

One or more embodiments may include a computer-readable storage medium comprising instructions that, if executed, enable a computing system to: receive a sample for malware analysis through a web interface; analyze the sample using a plurality of analyzers implemented on one or more computing devices, wherein the analyzers perform a sequence of configurable analytic steps to extract information about the sample; and display the extracted information to a user through the web interface.

These and other features and advantages will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments will now be described in connection with the associated drawings, in which:

FIG. 1 depicts a block diagram of an exemplary system in accordance with one or more embodiments.

FIG. 2 depicts a block diagram of an exemplary system in accordance with one or more embodiments.

FIG. 3 depicts a block flow diagram of an exemplary method in accordance with one or more embodiments.

FIG. 4 depicts an exemplary workflow editor in accordance with one or more embodiments.

FIG. 5-1 depicts a block diagram of an exemplary system in accordance with one or more embodiments.

FIG. 5-2 depicts an example of custom rules in accordance with one or more embodiments.

FIG. 5-3 depicts an exemplary analytic summary in accordance with one or more embodiments.

FIG. 6 depicts an exemplary interface in accordance with one or more embodiments.

FIG. 7 depicts an exemplary architecture for implementing a computing device in accordance with one or more embodiments.

FIG. 8 is an exemplary embodiment of the invention depicting an example workflow in which malware analyzers are run in a specified sequence.

DETAILED DESCRIPTION OF THE DRAWINGS

Exemplary embodiments are discussed in detail below. While specific exemplary embodiments are discussed, it should be understood that this is done for illustration purposes only. In describing and illustrating the exemplary embodiments, specific terminology is employed for the sake of clarity. However, the embodiments are not intended to be limited to the specific terminology so selected. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the embodiments. It is to be understood that each specific element includes all technical equivalents that operate in a similar manner to accomplish a similar purpose. The examples and embodiments described herein are non-limiting examples.

A system, method, medium, or computer-based product may provide tools to assist analysts and computer incident responders when analyzing malware. The system, method, medium, or product may be designed to reduce the amount of effort required to analyze and reverse engineer malware. It may help to identify the malware, what the malware did to a system, what the malware could have done, how one knows if the malware ran on one or more systems, and how one removes the malware from a system. The system, method, medium, or product may combine an expandable set of machine learning algorithms and rule sets for automated analysis, adaptors for external analytics, a workflow management framework for processing and reporting, and a web-based user interface.

The system, method, medium, or product can substantially increase the work productivity of malware analysts and computer incident responders. The system, method, medium, or product may provide users, e.g., novice and intermediate-level security experts, with the tools to perform at expert levels and with much greater efficiency. The system, method, medium, or product can be deployed as a stand-alone tool or can be integrated into an existing automated workflow.

FIG. 1 depicts a block diagram of an exemplary system 100 in accordance with one or more embodiments. System 100 may include one or more user devices, e.g., user device 120-1, user device 120-2, and user device 120-3, network 130, server 150, database 155, software module 165, and server 180.

The one or more user devices, e.g., user device 120-1, user device 120-2, and user device 120-3, may be any type of computing device, including a mobile telephone, a laptop, tablet, or desktop computer, a netbook, a video game device, a smart phone, an ultra-mobile personal computer (UMPC), etc. The one or more user devices may run one or more applications, such as Internet browsers, voice calls, video games, videoconferencing, and email, among others. The one or more user devices may be any combination of computing devices. These devices may be coupled to network 130.

Network 130 may provide network access, data transport, and other services to the devices coupled to it. In general, network 130 may include and implement any commonly defined network architectures, including those defined by standards bodies such as the Global System for Mobile communication (GSM) Association, the Internet Engineering Task Force (IETF), and the Worldwide Interoperability for Microwave Access (WiMAX) forum. For example, network 130 may implement one or more of a GSM architecture, a General Packet Radio Service (GPRS) architecture, a Universal Mobile Telecommunications System (UMTS) architecture, and an evolution of UMTS referred to as Long Term Evolution (LTE). Network 130 may, again as an alternative or in conjunction with one or more of the above, implement a WiMAX architecture defined by the WiMAX forum. Network 130 may also comprise, for instance, a local area network (LAN), a wide area network (WAN), the Internet, a virtual LAN (VLAN), an enterprise LAN, a layer 3 virtual private network (VPN), an enterprise IP network, or any combination thereof.

Server 150 or server 180 may also be any type of computing device coupled to network 130, including but not limited to a personal computer, a server computer, a series of server computers, a minicomputer, and a mainframe computer, or combinations thereof. Server 150 or server 180 may be a web server (or a series of servers) running a network operating system, examples of which may include but are not limited to Microsoft Windows Server, Novell NetWare, or Linux. Server 150 or server 180 may be used for and/or provide cloud and/or network computing. Although not shown in FIG. 1, server 150 and/or server 180 may have connections to external systems like email, SMS messaging, text messaging, ad content providers, etc. Any of the features of server 150 may also be implemented in server 180 and vice versa.

Database 155 may be any type of database, including a database managed by a database management system (DBMS). A DBMS is typically implemented as an engine that controls organization, storage, management, and retrieval of data in a database. DBMSs frequently provide the ability to query, back up and replicate, enforce rules, provide security, do computation, perform change and access logging, and automate optimization. Examples of DBMSs include Oracle Database, IBM DB2, Adaptive Server Enterprise, FileMaker, Microsoft Access, Microsoft SQL Server, MySQL, PostgreSQL, and a NoSQL implementation. A DBMS typically includes a modeling language, data structure, database query language, and transaction mechanism. The modeling language is used to define the schema of each database in the DBMS, according to the database model, which may include a hierarchical model, network model, relational model, object model, or some other applicable known or convenient organization. Data structures can include fields, records, files, objects, and any other applicable known or convenient structures for storing data. A DBMS may also include metadata about the data that is stored.

Software module 165 may be a module that is configured to send, process, and receive information at server 150. Software module 165 may provide another mechanism for sending and receiving data at server 150 besides handling requests through web server functionalities. Software module 165 may send and receive information using any technique for sending and receiving information between processes or devices, including but not limited to using a scripting language, a remote procedure call, an email, a tweet, an application programming interface, Simple Object Access Protocol (SOAP) methods, Common Object Request Broker Architecture (CORBA), HTTP (Hypertext Transfer Protocol), REST (Representational State Transfer), any interface for software components to communicate with each other, using any other known technique for sending information from one device to another, or any combination thereof.

Although software module 165 may be described in relation to server 150, software module 165 may reside on any other device. Further, the functionality of software module 165 may be duplicated on, distributed across, and/or performed by one or more other devices, either in whole or in part.

FIG. 2 depicts a block diagram of an exemplary system 200 in accordance with one or more embodiments. System 200 may provide a workflow management system for the automated, collaborative analysis and/or reverse engineering of malware. System 200 may combine an expandable set of machine learning algorithms and rule sets for automated analysis, adaptors for external analytics, a workflow management framework for processing and reporting, and a web-based user interface. System 200 may be implemented on system 100. For example, the software modules may be implemented by software module 165, and any information may be stored in database 155.

A user 210 may utilize system 200. System 200 may include one or more honeypots, e.g., honeypot 215-1, honeypot 215-2, and honeypot 215-3, threat navigation module 220, data bridge 225, workflow manager 230, analysis manager 235, one or more analyzers, e.g., 240-1, 240-2, and 240-3, one or more environments, e.g., 245-1, 245-2, and 245-3, results 250, and web interface 255.

FIG. 3 depicts a block flow diagram of an exemplary method 300 in accordance with one or more embodiments. Although exemplary method 300 will be discussed in conjunction with system 200, exemplary method 300 is not limited to execution on system 200, and may be implemented by any system capable of performing or being configured to perform exemplary method 300.

In block 310, a sample for malware analysis may be received. User 210, one or more honeypots, or any combination thereof, may submit one or more samples, e.g., files, binary files, etc., to initiate malware analysis. Samples may also be received via a data feed 211. In some instances, samples may be automatically collected and submitted via data feed 211. The samples may be submitted via a web interface. The one or more honeypots, e.g., honeypot 215-1, honeypot 215-2, and honeypot 215-3, may refer to a trap set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems. User 210 may be any user of system 200.

Threat navigation module 220 may receive one or more samples, and receipt of a sample may initiate a series of automated, configurable analytic steps. These steps may include application of machine learning models for signature-free assessment of threat severity, as well as external static and dynamic analytics, including file hashing, comparison against public or private whitelists/blacklists, and storage of ingested files and their resulting metadata, or any combination thereof. Threat navigation module 220 may be responsible for preprocessing the sample before entry into data bridge 225. Results of the preprocessing step may assist the system in determining initial workflows. Examples of preprocessing are: uncompressing a sample, decrypting a sample, and identifying the file type. Attributes such as the file type may affect the workflow by determining the analyzers that are applicable to the sample. Thus, the analyzers used in a workflow may be assigned based on the results of the preprocessing.
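The preprocessing step and the file-type-based assignment of analyzers can be illustrated with a minimal Python sketch. It assumes that the file type is identified from the sample's leading "magic" bytes, that gzip stands in for the uncompressing step, and that each analyzer declares the file types it accepts. The signature table, the `PdfScriptExtractor` analyzer, and the type-to-analyzer mapping are hypothetical.

```python
import gzip

# Hypothetical magic-byte signatures used to identify a sample's file type.
MAGIC_SIGNATURES = {
    b"MZ": "pe",          # Windows Portable Executable
    b"\x7fELF": "elf",    # ELF binary (Linux and other Unix-like systems)
    b"%PDF": "pdf",       # PDF document
    b"\x1f\x8b": "gzip",  # gzip-compressed data
}

# Hypothetical mapping from analyzer name to the file types it supports.
ANALYZER_FILE_TYPES = {
    "FindStrings": {"pe", "elf", "pdf", "unknown"},
    "BinaryFeatureExtractor": {"pe", "elf"},
    "PdfScriptExtractor": {"pdf"},
}


def identify_file_type(data: bytes) -> str:
    """Return a coarse file type based on the sample's leading bytes."""
    for magic, file_type in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return file_type
    return "unknown"


def preprocess(data: bytes, max_layers: int = 5) -> tuple[bytes, str]:
    """Strip up to max_layers gzip layers, then identify the payload's type."""
    file_type = identify_file_type(data)
    layers = 0
    while file_type == "gzip" and layers < max_layers:
        data = gzip.decompress(data)
        file_type = identify_file_type(data)
        layers += 1
    return data, file_type


def select_analyzers(file_type: str) -> list[str]:
    """Pick the analyzers whose declared file types include this sample's type."""
    return sorted(name for name, types in ANALYZER_FILE_TYPES.items()
                  if file_type in types)
```

In this sketch, a gzip-wrapped Windows executable is unwrapped, typed as `pe`, and routed only to the analyzers that declare support for that type. The layer limit guards against pathological archives that decompress into further archives.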

The files may be forwarded to data bridge 225 for storage in a sample repository. Data bridge 225 and/or the sample repository may be implemented by database 155. Samples captured by honeypots may be presented to threat navigation module 220 and forwarded to data bridge 225 for storage.
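One way a sample repository might store samples is sketched below in Python, assuming content-addressed storage: each sample is keyed by its SHA-256 digest, so the same file submitted by a user and captured by a honeypot is stored only once while both sources are recorded. The class and method names are hypothetical, and an in-memory dictionary stands in for a real database.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SampleRepository:
    """Hypothetical content-addressed store standing in for a sample repository."""

    _blobs: dict[str, bytes] = field(default_factory=dict)
    _sources: dict[str, set[str]] = field(default_factory=dict)

    def store(self, data: bytes, source: str) -> str:
        """Store a sample under its SHA-256 digest and record who submitted it."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # identical bytes are stored once
        self._sources.setdefault(digest, set()).add(source)
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch a stored sample by digest (KeyError if it was never stored)."""
        return self._blobs[digest]

    def sources(self, digest: str) -> set[str]:
        """Return a copy of the set of submitters seen for this sample."""
        return set(self._sources.get(digest, set()))

    def __len__(self) -> int:
        return len(self._blobs)
```

Keying by digest also gives every downstream analyzer a stable identifier for the sample, independent of its filename.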

Workflow manager 230 may leverage high-availability and fault-tolerant computer technologies to scale in processing power as the user base expands. Workflow manager 230 can easily integrate new analyzers while giving users the ability not only to schedule new workflows but also to stop existing workflows from the administrative interface of the system. This is all done without having to shut down or redeploy system 200.

Users may be able to create new workflows and/or modify existing workflows by invoking a workflow editor and selecting the desired analyzers. FIG. 4 depicts an exemplary workflow editor 400 in accordance with one or more embodiments. Workflow editor 400 may be presented to a user (e.g., user 210) on a user device (e.g., one or more of user devices 120-1, 120-2, or 120-3). Workflow editor 400 may be presented as a web page, by an application, or any combination thereof. Using workflow editor 400, a user may select one or more analyzers to run in the workflow. For example, in FIG. 4, the BinaryFeatureExtractor, CLAM_AV, FindStrings, and ModelScoringEngine analyzers have been selected. Any analyzer may be listed and/or selected by the user. A user may also specify an order in which to apply the selected analyzers, e.g., by using workflow editor 400. A user may also specify one or more workflow options. For example, a user may specify whether or not a workflow will support and/or use virtual machines, whether or not a workflow will support scripts, or any other user-selectable option associated with a workflow.
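The choices made in a workflow editor (selected analyzers, their order, and workflow options) could be captured as a small serializable record. The Python sketch below assumes a JSON representation for storage; the class name, the option keys `use_virtual_machines` and `support_scripts`, and the workflow name are hypothetical, while the analyzer names are those shown in FIG. 4.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WorkflowDefinition:
    """Hypothetical record of the choices a user makes in a workflow editor."""

    name: str
    analyzers: list[str]  # selected analyzers, in the order they should run
    options: dict[str, bool] = field(default_factory=dict)  # e.g. VM/script support

    def __post_init__(self) -> None:
        if not self.analyzers:
            raise ValueError("a workflow must select at least one analyzer")

    def to_json(self) -> str:
        """Serialize the definition so it can be stored in a workflow database."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "WorkflowDefinition":
        """Rebuild a definition previously produced by to_json()."""
        return cls(**json.loads(text))


# The selection shown in FIG. 4, with hypothetical option names.
fig4_workflow = WorkflowDefinition(
    name="fig4-example",
    analyzers=["BinaryFeatureExtractor", "CLAM_AV", "FindStrings", "ModelScoringEngine"],
    options={"use_virtual_machines": False, "support_scripts": True},
)
```

Keeping the definition as plain data, separate from the code that executes it, is one way to let users edit workflows without redeploying the system.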

Referring back to FIG. 3, in block 320, once the samples are stored, workflow manager 230 may invoke an analysis manager 235, which may invoke one or more analyzers, e.g., 240-1, 240-2, and 240-3, that perform a sequence of configured analytic steps to extract information about the sample. Analysis manager 235 may be preconfigured to follow a specific sequence created by default or a sequence generated by the user. In some embodiments, analysis manager 235 may have control only of one or more data analyzers, whereas workflow manager 230 may have a wider influence on the sequence of system actions.
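A configured sequence of analytic steps might be executed as in the following Python sketch. It assumes that each analyzer is a function that receives the sample and the results of the analyzers that ran before it, so a later step can build on an earlier one. The `find_strings` and `count_urls` analyzers and the registry are hypothetical stand-ins.

```python
import re
from typing import Callable

# Assumed analyzer shape: takes the sample bytes plus the results gathered so
# far (keyed by analyzer name) and returns a dict of newly extracted information.
Analyzer = Callable[[bytes, dict], dict]


def run_sequence(sample: bytes, sequence: list[str],
                 registry: dict[str, Analyzer]) -> dict[str, dict]:
    """Run the configured analyzers in order, passing earlier results forward."""
    results: dict[str, dict] = {}
    for name in sequence:
        analyzer = registry[name]  # KeyError if the workflow names an unknown analyzer
        results[name] = analyzer(sample, dict(results))  # shallow copy of prior results
    return results


def find_strings(sample: bytes, prior: dict) -> dict:
    """Hypothetical analyzer: printable ASCII runs of length >= 4, like `strings`."""
    return {"strings": [s.decode("ascii") for s in re.findall(rb"[ -~]{4,}", sample)]}


def count_urls(sample: bytes, prior: dict) -> dict:
    """Hypothetical analyzer that builds on FindStrings output from earlier steps."""
    strings = prior.get("FindStrings", {}).get("strings", [])
    return {"url_count": sum(s.startswith(("http://", "https://")) for s in strings)}


REGISTRY = {"FindStrings": find_strings, "UrlCounter": count_urls}
```

Because results flow forward, order matters here: running `UrlCounter` before `FindStrings` yields a count of zero, which is one reason a user-specified order can be significant.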

An analyzer, e.g., analyzers 240-1, 240-2, and 240-3, may refer to a discrete program, script, or environment designed to process a piece of malware in some manner to extract some useful piece of information within, or metadata about, the malware. The analyzer may be provided with a complete API of functions for storage, extraction, processing, and reporting on malware. An API, such as a RESTful interface, may be used to make the extracted information available to other computing devices and to upload the file of potential malware. An analyzer may be implemented in any programming language, e.g., Python or Java, and may be developed for implementation on any operating system, e.g., Linux, OS X, Windows, etc. Regardless of implementation, however, the analyzers may all integrate with the application programming interface.
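The idea that every analyzer integrates with one common interface can be sketched in Python with an abstract base class. The interface, the JSON envelope that a RESTful endpoint might return, and the entropy analyzer are all hypothetical; Shannon entropy is used only as a familiar example, since high byte entropy is a common heuristic for packed or encrypted content.

```python
import json
import math
from abc import ABC, abstractmethod
from collections import Counter


class BaseAnalyzer(ABC):
    """Hypothetical common interface that every analyzer integrates with."""

    name = "base"

    @abstractmethod
    def analyze(self, sample: bytes) -> dict:
        """Extract information from, or metadata about, the sample."""

    def report(self, sample_id: str, sample: bytes) -> str:
        """Wrap the output in a uniform JSON envelope, e.g. for a RESTful endpoint."""
        envelope = {"sample": sample_id, "analyzer": self.name,
                    "result": self.analyze(sample)}
        return json.dumps(envelope, sort_keys=True)


class EntropyAnalyzer(BaseAnalyzer):
    """Example analyzer: Shannon entropy of the sample in bits per byte (0 to 8)."""

    name = "Entropy"

    def analyze(self, sample: bytes) -> dict:
        if not sample:
            return {"entropy": 0.0}
        total = len(sample)
        entropy = sum(c / total * math.log2(total / c)
                      for c in Counter(sample).values())
        return {"entropy": round(entropy, 3)}
```

An analyzer written in another language could satisfy the same contract by producing the same envelope over HTTP; the base class simply makes that contract explicit for Python analyzers.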

The system may be capable of recursive analysis, in which each analytical outcome could reveal more information to invoke more analyzers. For example, a first analyzer may be run and produce a first analytical outcome as a result of the execution. The first analyzer may run a second analyzer, e.g., another analyzer different from the first analyzer or even the first analyzer itself, to process the first analytical outcome. The first analyzer may call the second analyzer before or after completing its own analysis. The first analyzer may use the results of the run of the second analyzer when performing its analysis.
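Recursive analysis can be sketched in Python as follows, assuming that each analyzer returns both its findings and any child artifacts it uncovered (for example, a decoded payload), and that those children are fed back through the analyzers. The depth limit and the "already seen" check are assumptions added to keep the recursion bounded. The base64-decoding analyzer is a hypothetical example.

```python
import base64
import binascii
import re
from typing import Callable, Optional

# Assumed analyzer shape: returns (findings, child_artifacts). Child artifacts,
# such as a decoded payload, are fed back through the analyzers.
Analyzer = Callable[[bytes], tuple[dict, list[bytes]]]


def analyze_recursively(artifact: bytes, analyzers: list[Analyzer],
                        max_depth: int = 3, depth: int = 0,
                        seen: Optional[set] = None) -> dict:
    """Analyze an artifact, then recurse into anything the analyzers extract."""
    seen = set() if seen is None else seen
    seen.add(artifact)
    node = {"depth": depth, "findings": {}, "children": []}
    extracted: list[bytes] = []
    for analyzer in analyzers:
        findings, children = analyzer(artifact)
        node["findings"].update(findings)
        extracted.extend(children)
    if depth < max_depth:  # bound the recursion
        for child in extracted:
            if child not in seen:  # skip artifacts already analyzed (avoids cycles)
                node["children"].append(
                    analyze_recursively(child, analyzers, max_depth, depth + 1, seen))
    return node


def base64_blobs(artifact: bytes) -> tuple[dict, list[bytes]]:
    """Hypothetical analyzer: find and decode base64 runs of 8+ characters."""
    blobs = []
    for token in re.findall(rb"[A-Za-z0-9+/]{8,}={0,2}", artifact):
        try:
            blobs.append(base64.b64decode(token, validate=True))
        except binascii.Error:
            continue  # not valid base64 (e.g. wrong length); ignore it
    return {"base64_blobs": len(blobs)}, blobs
```

For a payload that has been base64-encoded twice, the first pass reveals an encoded blob, whose analysis reveals a second blob, producing a tree two levels deep.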

The sequence of configured analytic steps performed by the analyzers may include forwarding the sample to one or more environments, e.g., 245-1, 245-2, and 245-3, for execution and behavioral profiling. The one or more environments may include a sandbox environment for execution and behavioral profiling. The one or more environments may include hardware configurations to which samples may be sent for processing.

Instructions to and results from the analyzers may be passed via a heterogeneous set of messaging mechanisms.

FIG. 5-1 depicts one or more analyzers 520, 521, and 522 processing binary sample 510 (suspected malware) and interrogating the rule knowledgebase 540 (via the rules engine 530) to extract knowledge to produce classification, observations, and conclusions that are presented to the user as an analytic summary 501. The analytic summary 501 may be a conversion of technical data into actionable data points that can be consumed by users of the system, e.g., novice users of the system. The rule knowledgebase may be updated as new rules are developed. FIG. 5-2 depicts an example of a rule.
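A rules engine that turns technical observations into plain-language, confidence-tagged conclusions might look like the Python sketch below. It assumes that a rule is a predicate over analyzer observations plus a conclusion and a confidence. The observation keys, the 7.2 bits-per-byte entropy threshold, and the confidences are hypothetical, and the rule format is not the one shown in FIG. 5-2. `SetWindowsHookExW` is a real Windows API; a thread identifier of 0 associates the hook with all threads on the caller's desktop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical knowledgebase entry: a condition over analyzer observations
    and the plain-language conclusion to report when the condition holds."""

    name: str
    condition: Callable[[dict], bool]
    conclusion: str
    confidence: float  # 0.0 to 1.0


def evaluate(observations: dict, knowledgebase: list[Rule]) -> list[str]:
    """Return analytic-summary lines for every rule that fires, most confident first."""
    fired = [rule for rule in knowledgebase if rule.condition(observations)]
    fired.sort(key=lambda rule: rule.confidence, reverse=True)
    return [f"{rule.conclusion} ({rule.confidence:.0%})" for rule in fired]


# Hypothetical rules; observation keys, threshold, and confidences are illustrative.
KNOWLEDGEBASE = [
    Rule("global_hook",
         # Thread id 0 means the hook applies to every thread on the desktop.
         lambda obs: "SetWindowsHookExW" in obs.get("api_calls", ())
         and obs.get("hook_thread_id") == 0,
         "The target was observed installing a function hook for all desktop programs.",
         0.70),
    Rule("packed",
         lambda obs: obs.get("entropy", 0.0) > 7.2,
         "The sample appears to be packed or encrypted.",
         0.60),
]
```

Because rules are data, new entries can be appended to the knowledgebase without changing the engine, which matches the idea of updating the knowledgebase as new rules are developed.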

FIG. 5-3 depicts an exemplary analytic summary 501 in accordance with one or more embodiments. Analytic summary 501 may include several examples of the actionable data for one or more analyzed samples. For example, analytic summary 501 includes the actionable data “The target was observed installing a function hook for all desktop programs to interrupt all graphical actions (e.g. mouse clicks, menu options, new windows, etc.). (70%).” From this actionable data, a user may gain an understanding of the behavior of the sample and determine whether or not to pursue further action, e.g., removing the malware, alerting someone about the malware, etc. Actionable data may include a percentage that indicates the system's confidence level.

One or more analyzers may leverage machine learning technology to automatically classify each submitted sample and attempt to determine if the sample is malware or not without requiring any antivirus signatures.
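Signature-free classification means learning a decision rule from features of labeled samples rather than matching known byte patterns. As one illustration, the Python sketch below fits a logistic-regression model with plain gradient descent. The model choice, the two features (normalized entropy and the fraction of suspicious API imports), and the four training vectors are hypothetical toy values, not data from the system described here.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples: list[list[float]], labels: list[int],
          lr: float = 1.0, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit logistic-regression weights with full-batch gradient descent."""
    n, d = len(samples), len(samples[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def malware_probability(model: tuple[list[float], float], x: list[float]) -> float:
    """Score a feature vector; values above 0.5 lean toward 'malware'."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy, hypothetical feature vectors: [entropy / 8, fraction of suspicious API imports].
FEATURES = [[0.95, 0.5], [0.9, 0.8], [0.5, 0.0], [0.6, 0.1]]
LABELS = [1, 1, 0, 0]  # 1 = malware, 0 = benign
MODEL = train(FEATURES, LABELS)
```

The resulting probability can double as the confidence percentage shown alongside a conclusion. A production model would be trained on far larger feature sets and evaluated on held-out samples.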

Referring back to FIG. 3, results from the analyzers may be stored, and, once analysis is complete, the results may be presented at the user interface as a report. In block 330, results of the analysis 250 may be displayed to the user in the web interface 255. The results may be information extracted about the sample during the analysis. As shown in FIG. 5-3, the results may be a clear, concise, and simple explanation of the submitted malware, and may include anything from complex classification, to basic, high-level conclusions (“What is it?”), to suggestions for further proof or remediation of the target, or any combination thereof. The output may be designed to be understandable to anyone from a newly hired junior system administrator to an executive-level user responsible for thousands of machines. As discussed above, FIG. 5-3 depicts an exemplary analytic summary 501. Analytic summary 501 may be an example of the report displayed at the user interface.

Via the web interface 255, results may be annotated and shared, and additional analytics may be requested. Users may retrieve the results of prior analyses via the web interface 255, and current and prior analyses may be annotated and shared. For example, a user may provide, through web interface 255, an annotation of extracted information that provides an identification of, or steps for remediating, the sample. The annotation may be transmitted to one or more other users so that the other users can more easily identify and/or remediate the sample.

FIG. 6 depicts an exemplary interface 600 in accordance with one or more embodiments. Interface 600 may be presented via the web interface 255.

Alerts may inform users when the results of new analyses are available. For example, a user may be identified as being interested in a particular instance, type, or class of malware. Whenever a new analysis of a sample is performed and that sample matches the particular instance, type, or class of interest, system 200 may transmit an alert to the user indicating that the new analysis is available. The alert may include the timestamp for the identification, the filename of the triggering malware, the SHA1 or other unique hash of the binary, and the name of the alert that was triggered. A URL may also be provided to view any metadata or report information generated for the binary.
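Alert generation can be sketched in Python as a match between a newly analyzed sample's class and the classes users have registered interest in. Each alert carries the fields listed above: timestamp, filename, SHA1 hash, alert name, and a report URL. The `Subscription` record, the class labels, and the report base URL are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Subscription:
    """Hypothetical record that a user wants alerts for a class of malware."""

    user: str
    alert_name: str
    malware_class: str


def build_alerts(filename: str, data: bytes, malware_class: str,
                 subscriptions: list[Subscription],
                 report_base_url: str) -> list[dict]:
    """Create one alert per subscription whose class matches the new analysis."""
    sha1 = hashlib.sha1(data).hexdigest()
    timestamp = datetime.now(timezone.utc).isoformat()
    return [
        {
            "user": sub.user,
            "timestamp": timestamp,
            "filename": filename,
            "sha1": sha1,
            "alert": sub.alert_name,
            "report_url": f"{report_base_url}/{sha1}",
        }
        for sub in subscriptions
        if sub.malware_class == malware_class
    ]
```

Building the report URL from the hash, rather than the filename, keeps the link stable even when the same binary is submitted under different names.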

FIG. 7 depicts an exemplary architecture for implementing a computing device 700 in accordance with one or more embodiments, which may be used to implement any of the devices discussed herein, or any other computer system or computing device component thereof. It will be appreciated that other devices that can be used with the computing device 700, such as a client or a server, may be similarly configured. As illustrated in FIG. 7, computing device 700 may include a bus 710, a processor 720, a memory 730, a read only memory (ROM) 740, a storage device 750, an input device 760, an output device 770, and a communication interface 780.

Bus 710 may include one or more interconnects that permit communication among the components of computing device 700. Processor 720 may include any type of processor, microprocessor, or processing logic that may interpret and execute instructions (e.g., a field programmable gate array (FPGA)). Processor 720 may include a single device (e.g., a single core) and/or a group of devices (e.g., multi-core). Memory 730 may include a random access memory (RAM) or another type of dynamic storage device that may store information and instructions for execution by processor 720. Memory 730 may also be used to store temporary variables or other intermediate information during execution of instructions by processor 720.

ROM 740 may include a ROM device and/or another type of static storage device that may store static information and instructions for processor 720. Storage device 750 may include a magnetic disk and/or optical disk and its corresponding drive for storing information and/or instructions. Storage device 750 may include a single storage device or multiple storage devices, such as multiple storage devices operating in parallel. Moreover, storage device 750 may reside locally on the computing device 700 and/or may be remote with respect to a server and connected thereto via a network and/or another type of connection, such as a dedicated link or channel.

Input device 760 may include any mechanism or combination of mechanisms that permit an operator to input information to computing device 700, such as a keyboard, a mouse, a touch sensitive display device, a microphone, a pen-based pointing device, and/or a biometric input device, such as a voice recognition device and/or a fingerprint scanning device. Output device 770 may include any mechanism or combination of mechanisms that outputs information to the operator, including a display, a printer, a speaker, etc.

Communication interface 780 may include any transceiver-like mechanism that enables computing device 700 to communicate with other devices and/or systems, such as a client, a server, a license manager, a vendor, etc. For example, communication interface 780 may include one or more interfaces, such as a first interface coupled to a network and/or a second interface coupled to a license manager. Alternatively, communication interface 780 may include other mechanisms (e.g., a wireless interface) for communicating via a network, such as a wireless network. In one implementation, communication interface 780 may include logic to send code to a destination device, such as a target device that can include general purpose hardware (e.g., a personal computer form factor), dedicated hardware (e.g., a digital signal processing (DSP) device adapted to execute a compiled version of a model or a part of a model), etc.

Computing device 700 may perform certain functions in response to processor 720 executing software instructions contained in a computer-readable medium, such as memory 730. In alternative embodiments, hardwired circuitry may be used in place of or in combination with software instructions to implement features consistent with principles of the disclosure. Thus, implementations consistent with principles of the disclosure are not limited to any specific combination of hardware circuitry and software.

FIG. 8 depicts an exemplary workflow according to one embodiment of the invention. Starting at start point 805, the workflow specifies that analyzers 810, 820, 830, and 840 are run simultaneously from divergence point 807. The workflow then specifies that analyzer 850 is run after analyzers 810 through 840 are completed at convergence point 880. Decision point 890 specifies that analyzer 860 is run if the results from analyzer 850 show that the sample is suspected to be malware. At convergence point 895, the analysis workflow is complete, and the results of all the analyzers are gathered for presentation at finish 897.
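The FIG. 8 control flow (fan out, converge, decide, finish) can be sketched in Python with a thread pool from the standard library. The analyzers below are trivial hypothetical stand-ins for analyzers 810 to 860, and treating a leading `MZ` header as "suspected malware" is purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Analyzer = Callable[[bytes], dict]


def run_fig8_workflow(sample: bytes,
                      parallel: dict[str, Analyzer],
                      follow_up: tuple[str, Analyzer],
                      conditional: tuple[str, Analyzer],
                      is_suspect: Callable[[dict], bool]) -> dict[str, dict]:
    """Sketch of the FIG. 8 flow: fan out, converge, then a conditional step."""
    results: dict[str, dict] = {}
    # Divergence point: start every analyzer in the parallel group at once.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, sample) for name, fn in parallel.items()}
        # Convergence point: block until each parallel analyzer has finished.
        for name, future in futures.items():
            results[name] = future.result()
    # The follow-up analyzer runs only after the whole parallel group completes.
    follow_name, follow_fn = follow_up
    results[follow_name] = follow_fn(sample)
    # Decision point: run the conditional analyzer only for suspected malware.
    if is_suspect(results[follow_name]):
        cond_name, cond_fn = conditional
        results[cond_name] = cond_fn(sample)
    # Final convergence: hand back every result for presentation.
    return results


# Hypothetical stand-ins for analyzers 810-860.
PARALLEL = {
    "Analyzer810": lambda s: {"size": len(s)},
    "Analyzer820": lambda s: {"starts_with_mz": s.startswith(b"MZ")},
    "Analyzer830": lambda s: {"null_bytes": s.count(0)},
    "Analyzer840": lambda s: {"uppercase_bytes": sum(65 <= c <= 90 for c in s)},
}
FOLLOW_UP = ("Analyzer850", lambda s: {"suspect": s.startswith(b"MZ")})
CONDITIONAL = ("Analyzer860", lambda s: {"deep_scan": "completed"})
```

Threads are a reasonable choice when analyzers mostly wait on I/O or external sandboxes. CPU-heavy analyzers might instead use a process pool or separate machines.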

Exemplary embodiments may be embodied in many different ways as a software component. For example, it may be a stand-alone software package, a combination of software packages, or it may be a software package incorporated as a “tool” in a larger software product. It may be downloadable from a network, for example, a web site, as a stand-alone product or as an add-in package for installation in an existing software application. It may also be available as a client-server software application, or as a web-enabled software application. It may also be embodied as a software package installed on a hardware device.

Numerous specific details have been set forth to provide a thorough understanding of the embodiments. It will be understood, however, that the embodiments may be practiced without these specific details. In other instances, well-known operations, components and circuits have not been described in detail so as not to obscure the embodiments. It can be appreciated that the specific structural and functional details are representative and do not necessarily limit the scope of the embodiments.

It is worthy to note that any reference to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in the specification are not necessarily all referring to the same embodiment.

Although some embodiments may be illustrated and described as comprising exemplary functional components or modules performing various operations, it can be appreciated that such components or modules may be implemented by one or more hardware components, software components, and/or combination thereof. The functional components and/or modules may be implemented, for example, by logic (e.g., instructions, data, and/or code) to be executed by a logic device (e.g., processor). Such logic may be stored internally or externally to a logic device on one or more types of computer-readable storage media.

Some embodiments may comprise an article of manufacture. An article of manufacture may comprise a storage medium to store logic. Examples of a storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. Examples of storage media include hard drives, disk drives, solid state drives, and any other tangible storage media.

It also is to be appreciated that the described embodiments illustrate exemplary implementations, and that the functional components and/or modules may be implemented in various other ways which are consistent with the described embodiments. Furthermore, the operations performed by such components or modules may be combined and/or separated for a given implementation and may be performed by a greater number or fewer number of components or modules.

Some of the figures may include a flow diagram. Although such figures may include a particular logic flow, it can be appreciated that the logic flow merely provides an exemplary implementation of the general functionality. Further, the logic flow does not necessarily have to be executed in the order presented unless otherwise indicated. In addition, the logic flow may be implemented by a hardware element, a software element executed by a processor, or any combination thereof.

While various exemplary embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should instead be defined only in accordance with the following claims and their equivalents.

1-20. (canceled)
 21. A computer-implemented method comprising: obtaininga particular analyzer that is trained using machine learning to classifydata samples as likely including malware or as likely not includingmalware; providing data for generating a graphical user interface at auser device, the graphical user interface being configured to receive,through selectable options, configuration data that defines auser-defined workflow to control one or more analyzers for analyzingmalware having a particular malware attribute; receiving, at a serverfrom the user device, the configuration data; storing the configurationdata in a workflow definition database, the workflow definition databaseincluding workflow definitions for a plurality of workflows respectivelyassociated with a plurality of malware attributes; receiving a sampleincluding a potential malware; determining, by the server, at least onemalware attribute of the sample; determining, by the server, that the atleast one malware attribute of the sample is associated with theparticular malware attribute; selecting, from the plurality ofworkflows, the user-defined workflow for analyzing the sample; causing,by the server, the one or more analyzers to analyze the sample accordingto the user-defined workflow associated with the stored configurationdata to generate an analysis result that indicates a likelihood that thesample includes malware or does not include malware, the one or moreanalyzers including the particular analyzer that is trained usingmachine learning; and providing the analysis result for output.
22. The computer-implemented method of claim 21, wherein determining, by the server, at least one malware attribute of the sample comprises one or more of: uncompressing the sample; decrypting the sample; and identifying a file type of the sample.
23. The computer-implemented method of claim 22, further comprising: selecting the particular analyzer to analyze the sample based on the identified file type.
24. The computer-implemented method of claim 21, wherein receiving, at a server from the user device, the configuration data comprises one or more of: receiving order data indicative of an order in which to apply the one or more analyzers for analyzing the malware having the particular malware attribute; and receiving compatibility data indicating that the user-defined workflow is configured to support one or more scripts or one or more virtual machines.
25. The computer-implemented method of claim 21, further comprising: causing, by the server, an analyzer other than the particular analyzer to analyze the analysis result.
26. The computer-implemented method of claim 21, wherein causing, by the server, one or more analyzers to analyze the sample according to the user-defined workflow comprises: providing the sample to one or more environments for execution and behavioral profiling.
27. The computer-implemented method of claim 21, wherein causing, by the server, one or more analyzers to analyze the sample according to the user-defined workflow to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware comprises: obtaining one or more rules from a database; classifying the sample as likely including malware or as likely not including malware based on the obtained one or more rules; and generating the analysis result based, in part, on the classifying.
28. A non-transitory computer-readable storage medium encoded with a computer program, the computer program comprising instructions that, upon execution by a computer, cause the computer to perform operations comprising: obtaining a particular analyzer that is trained using machine learning to classify data samples as likely including malware or as likely not including malware; providing data for generating a graphical user interface at a user device, the graphical user interface being configured to receive, through selectable options, configuration data that defines a user-defined workflow to control one or more analyzers for analyzing malware having a particular malware attribute; receiving, from the user device, the configuration data; storing the configuration data in a workflow definition database, the workflow definition database including workflow definitions for a plurality of workflows respectively associated with a plurality of malware attributes; receiving a sample including a potential malware; determining that the particular malware attribute is an attribute of the sample; causing the one or more analyzers to analyze the sample according to the user-defined workflow associated with the stored configuration data to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware, the one or more analyzers including the particular analyzer that is trained using machine learning; and providing the analysis result for output.
29. The non-transitory computer-readable storage medium of claim 28, wherein determining that the particular malware attribute is an attribute of the sample comprises one or more of: uncompressing the sample; decrypting the sample; and identifying a file type of the sample and selecting the particular analyzer to analyze the sample based on the identified file type.
30. The non-transitory computer-readable storage medium of claim 28, wherein receiving, from the user device, the configuration data comprises one or more of: receiving order data indicative of an order in which to apply the one or more analyzers for analyzing the malware having the particular malware attribute; and receiving compatibility data indicating that the user-defined workflow is configured to support one or more scripts or one or more virtual machines.
31. The non-transitory computer-readable storage medium of claim 28, wherein the operations further comprise: causing an analyzer other than the particular analyzer to analyze the analysis result.
32. The non-transitory computer-readable storage medium of claim 28, wherein causing one or more analyzers to analyze the sample according to the user-defined workflow comprises: providing the sample to one or more environments for execution and behavioral profiling.
33. The non-transitory computer-readable storage medium of claim 28, wherein causing one or more analyzers to analyze the sample according to the user-defined workflow to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware comprises: obtaining one or more rules from a database; classifying the sample as likely including malware or as likely not including malware based on the obtained one or more rules; and generating the analysis result based, in part, on the classifying.
34. A system comprising: one or more processors and one or more computer storage media storing instructions that are operable and when executed by the one or more processors, cause the one or more processors to perform operations comprising: obtaining a particular analyzer that is trained using machine learning to classify data samples as likely including malware or as likely not including malware; providing data for generating a graphical user interface at a user device, the graphical user interface being configured to receive, through selectable options, configuration data that defines a user-defined workflow to control one or more analyzers for analyzing malware having a particular malware attribute; receiving, from the user device, the configuration data; storing the configuration data in a workflow definition database, the workflow definition database including workflow definitions for a plurality of workflows respectively associated with a plurality of malware attributes; receiving a sample including a potential malware; determining that the particular malware attribute is an attribute of the sample; causing the one or more analyzers to analyze the sample according to the user-defined workflow associated with the stored configuration data to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware, the one or more analyzers including the particular analyzer that is trained using machine learning; and providing the analysis result for output.
35. The system of claim 34, wherein determining that the particular malware attribute is an attribute of the sample comprises one or more of: uncompressing the sample; decrypting the sample; and identifying a file type of the sample.
36. The system of claim 35, wherein the operations further comprise: selecting the particular analyzer to analyze the sample based on the identified file type.
37. The system of claim 34, wherein receiving, from the user device, the configuration data comprises one or more of: receiving order data indicative of an order in which to apply the one or more analyzers for analyzing the malware having the particular malware attribute; and receiving compatibility data indicating that the user-defined workflow is configured to support one or more scripts or one or more virtual machines.
38. The system of claim 34, wherein the operations further comprise: causing an analyzer other than the particular analyzer to analyze the analysis result.
39. The system of claim 34, wherein causing one or more analyzers to analyze the sample according to the user-defined workflow comprises: providing the sample to one or more environments for execution and behavioral profiling.
40. The system of claim 34, wherein causing one or more analyzers to analyze the sample according to the user-defined workflow to generate an analysis result that indicates a likelihood that the sample includes malware or does not include malware comprises: obtaining one or more rules from a database; classifying the sample as likely including malware or as likely not including malware based on the obtained one or more rules; and generating the analysis result based, in part, on the classifying.
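The workflow dispatch recited in claims 21 to 27 (determine an attribute of the sample, look up the user-defined workflow stored for that attribute, apply its analyzers in the configured order, and classify the result) may be easier to follow as a short sketch. The following is a minimal, hypothetical Python illustration, not the disclosed implementation. Every name in it (Workflow, determine_attribute, make_rule_analyzer, entropy_analyzer, run_workflow) is an assumption made for illustration. The sketch also makes several simplifying choices. It handles only gzip when uncompressing and omits decryption. It identifies file types from a small magic-byte table. It combines analyzer scores by taking their maximum and compares that value against an assumed threshold of 0.5. The entropy function is a simple stand-in for the analyzer trained using machine learning; it is not a trained model.

```python
import gzip
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical analyzer interface: raw sample bytes in, score in [0, 1] out.
Analyzer = Callable[[bytes], float]


@dataclass
class Workflow:
    """One stored workflow definition (hypothetical schema)."""
    attribute: str                                      # associated malware attribute
    analyzers: List[str] = field(default_factory=list)  # analyzer names, in user-defined order


def determine_attribute(sample: bytes) -> Tuple[str, bytes]:
    """Uncompress gzip data if present, then identify the file type by magic bytes."""
    if sample.startswith(b"\x1f\x8b"):
        sample = gzip.decompress(sample)
    magic = {b"MZ": "pe", b"\x7fELF": "elf", b"%PDF": "pdf"}
    for prefix, file_type in magic.items():
        if sample.startswith(prefix):
            return file_type, sample
    return "unknown", sample


def make_rule_analyzer(rules: List[bytes]) -> Analyzer:
    """Rule-based classifier: 1.0 if any byte-pattern rule matches, else 0.0."""
    def analyze(sample: bytes) -> float:
        return 1.0 if any(rule in sample for rule in rules) else 0.0
    return analyze


def entropy_analyzer(sample: bytes) -> float:
    """Stand-in for the ML-trained analyzer: Shannon entropy normalized to [0, 1]."""
    if not sample:
        return 0.0
    n = len(sample)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(sample).values())
    return h / 8.0  # 8 bits is the maximum entropy per byte


def run_workflow(sample: bytes, workflows: Dict[str, Workflow],
                 registry: Dict[str, Analyzer], threshold: float = 0.5) -> dict:
    """Select the workflow for the sample's attribute and apply its analyzers in order."""
    attribute, data = determine_attribute(sample)
    workflow = workflows.get(attribute)
    names = workflow.analyzers if workflow else []
    scores = {name: registry[name](data) for name in names}  # dicts keep insertion order
    return {
        "attribute": attribute,
        "scores": scores,
        "likely_malware": max(scores.values(), default=0.0) >= threshold,
    }


# Usage: a gzip-compressed PE-like sample containing a pattern that a rule flags.
workflows = {"pe": Workflow("pe", ["rules", "entropy"])}
registry = {"rules": make_rule_analyzer([b"EVIL_PAYLOAD"]), "entropy": entropy_analyzer}
sample = gzip.compress(b"MZ" + b"\x00" * 64 + b"EVIL_PAYLOAD")
print(run_workflow(sample, workflows, registry))
```

In this sketch, the workflow definition database is reduced to a dictionary keyed by attribute, and the "order data" of claim 24 is the order of names in Workflow.analyzers. A sample whose attribute has no stored workflow receives no scores and is reported as not likely malware. A real system may well choose a different default for that case.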